SUMMARY


This paper treats the convergence of adaptive LMS filters and, in
particular, the adaptive line enhancer (ALE). The learning curves of
such a filter are a sum of exponentially decaying modes, with time
constants determined by the eigenvalues of the input correlation matrix
and relative initial magnitudes given by the projections of the filter
on the eigenvectors. It is shown that, for large filter lengths, a simple
correspondence may be set up between the discrete and continuous cases.
Indexed by frequency, the eigenvalues of the correlation matrix correspond
to the magnitude of the power spectrum, and the projections onto the
eigenvectors to the filter transfer function. A detailed analysis is
carried out for single pole spectra and evaluated through a computer
simulation. In general, the techniques developed provide a physical
context, i.e., the signal spectrum, in which to evaluate convergence.
Thus, it is possible, with varying degrees of accuracy depending on
knowledge of the input spectrum, to predict the convergence behavior of
the system.
I.  INTRODUCTION


This study investigates the convergence properties of LMS adaptive
filters for a wide class of input spectra, thus extending some results of
[1] and [2]. Utilization of the asymptotic properties of Toeplitz matrices
permits the development of a fairly general theory. More specific
conclusions are reached for the case of the adaptive line enhancer (ALE),
which is used both as an introductory example and a concrete application.
In particular, a detailed analysis is carried out for single pole spectra
(i.e., a narrowband signal in white noise). Several simplified derivations
of previous results ([2] and [3]) are included in order that the discussion
may remain self-contained.

We begin with a brief description of some basic equations (Section II).
This is followed by a treatment of sinusoids in white noise (Section III),
which is intended to provide some intuition for the theory of Section IV.

In Section IV, it is shown that for large filter lengths, the learning
curves of the discrete LMS filter may be approximated by the continuous
case. In particular, a great deal may be said about the convergence
process without extensive calculations, simply from some knowledge of the
input spectrum.

In Section V, the theory is applied to single-pole spectra and evaluated
through a computer simulation.


II.  BACKGROUND 

The convergence time for a linear time invariant system depends, in
general, on the eigenvalues of the system and its initial state. When
adjustable parameters are involved, a coarse indicator of the optimal
convergence rate to be obtained is the conditioning number; that is, the
ratio of the largest to the smallest eigenvalue. However, inasmuch as
suitably chosen initial conditions can arbitrarily prolong convergence,
any further statement requires some limitation on the initial state,
usually taken to be zero. This is not so restrictive as may at first
appear.

More precisely, consider the implementation of the LMS adaptive filter
pictured in Figure 1. For a stationary input vector X(k), the recursion
equations for the mean of the linear prediction filter, W(k), are given
approximately (W(k) and X(k) are assumed independent for large k; cf.
[1], [4]) by [1]

    $$\bar{W}(k+1) = A\,\bar{W}(k) + 2\mu P \tag{1}$$

where $\bar{W}(k) = E\,W(k)$ is the expected value of the L-dimensional
weight vector W(k) at time k, and

    $$A = I - 2\mu R = I - 2\mu E[X^{T}X], \qquad P = E(dX), \tag{2}$$

with $\mu$ a feedback parameter, R the autocorrelation matrix of the input
process, and P the cross-correlation of the input with the scalar d.
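Equation (1) describes the mean of a stochastic weight update. As a minimal sketch, assuming the usual LMS form W(k+1) = W(k) + 2μ e(k) X(k) (whose expectation, under the independence assumption, gives (1)), the following predictor is run on a noiseless sinusoid. The function name `lms_predict`, the filter length, the step size, and the test signal are illustrative choices, not taken from the paper.

```python
import math

def lms_predict(x, L=4, mu=0.05, delay=1):
    """Run the LMS update W <- W + 2*mu*e*X on a delayed-input predictor."""
    w = [0.0] * L                      # zero initial state, W(0) = 0
    errors = []
    for n in range(L + delay - 1, len(x)):
        # Input vector X(n): the L samples ending `delay` steps before x[n].
        X = [x[n - delay - j] for j in range(L)]
        y = sum(wj * xj for wj, xj in zip(w, X))   # filter output
        e = x[n] - y                               # error against d = x[n]
        w = [wj + 2.0 * mu * e * xj for wj, xj in zip(w, X)]
        errors.append(e)
    return w, errors

# Hypothetical test input: a noiseless sinusoid, which is exactly predictable.
signal = [math.cos(0.5 * n) for n in range(2000)]
weights, errors = lms_predict(signal)
print(abs(errors[0]), abs(errors[-1]))
```

With no noise present the prediction error decays toward zero, which is the behavior the mean recursion (1) predicts when every eigenvalue of A has magnitude less than one.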


It is easily seen by direct substitution that

    $$\bar{W}(k) = A^{k}W(0) + 2\mu\Big(\sum_{j=0}^{k-1}A^{j}\Big)P \tag{3}$$

is a solution to (1). It follows that a necessary and sufficient condition
for (1) to converge for all initial states W(0) is that the eigenvalues of A
have magnitude less than one; i.e.,

    $$0 < \mu < \frac{1}{\lambda_{\max}} \tag{4}$$

where $\lambda_{\max}$ is the maximum eigenvalue of R.


In that case,

    $$W^{*} = \lim_{k\to\infty}\bar{W}(k)
            = 2\mu\Big(\sum_{j=0}^{\infty}A^{j}\Big)P
            = 2\mu(I-A)^{-1}P
            = R^{-1}P, \tag{5}$$

the well-known solution to the discrete Wiener-Hopf equation.
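The convergence of recursion (1) to the Wiener solution (5) can be checked numerically. The sketch below iterates (1) from a zero initial state for a hypothetical 2x2 correlation matrix; the values of `R`, `P`, and `mu` are made up for illustration and satisfy condition (4).

```python
def mat_vec(M, v):
    """Plain matrix-vector product for small dense matrices."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def mean_weight_recursion(R, P, mu, steps):
    """Iterate W(k+1) = (I - 2*mu*R) W(k) + 2*mu*P from W(0) = 0."""
    n = len(P)
    w = [0.0] * n
    for _ in range(steps):
        Rw = mat_vec(R, w)
        w = [w[i] - 2.0 * mu * Rw[i] + 2.0 * mu * P[i] for i in range(n)]
    return w

# Hypothetical example: the eigenvalues of R are 1.5 and 0.5, so mu < 1/1.5.
R = [[1.0, 0.5], [0.5, 1.0]]
P = [1.0, 0.25]
mu = 0.1

# Closed-form Wiener solution R^{-1} P for a 2x2 matrix.
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
w_star = [(R[1][1] * P[0] - R[0][1] * P[1]) / det,
          (R[0][0] * P[1] - R[1][0] * P[0]) / det]

w_k = mean_weight_recursion(R, P, mu, steps=400)
print(w_k, w_star)
```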


The terms of equation (3) represent a decomposition of the evolution
of $\bar{W}(k)$ into the decay of the old state

    $$A^{k}W(0) \tag{6}$$

and the growth of the new state

    $$2\mu\Big(\sum_{j=0}^{k-1}A^{j}\Big)P. \tag{7}$$


If the new state is substantially different from the old state (i.e., if
its presence is easily distinguished), its detection will depend essentially
on expression (7). This, and considerations of tractability, lead us to
treat only the "growth rate"; i.e., we set $\bar{W}(0) = 0$. Equations (3)
and (5) yield

    $$W^{*} - \bar{W}(k) = 2\mu\sum_{j=k}^{\infty}A^{j}P = A^{k}W^{*} \tag{8}$$

or

    $$\|W^{*} - \bar{W}(k)\|^{2} = W^{*T}(A^{k})^{T}A^{k}W^{*}. \tag{9}$$

The symbol T stands for conjugate transpose and $\|\cdot\|$ indicates the
vector norm. Let $\lambda_\nu$, $\nu = 1, \ldots, L$ be the eigenvalues and
$E^{\nu}$ the corresponding eigenvectors of R. Then (9) may be written

    $$\|W^{*} - \bar{W}(k)\|^{2}
      = \sum_{\nu=1}^{L}(1-2\mu\lambda_\nu)^{2k}\,|E^{\nu}\cdot W^{*}|^{2}. \tag{10}$$

Note that the $\cdot$ represents the scalar product between two vectors.
Equation (10) describes the convergence of $\bar{W}(k)$ to $W^{*}$.
A similar expression may be derived for the convergence of the mean
square error $\xi(k) = E(e^{2}(k))$ (see Figure 1a) to its limiting value
$\xi^{*}$. It is easily shown [1] that, in the absence of gradient noise
(i.e., $L\mu$ small),

    $$\xi(k) - \xi^{*} = (W^{*} - \bar{W}(k))^{T}R\,(W^{*} - \bar{W}(k))
      = \sum_{\nu=1}^{L}(1-2\mu\lambda_\nu)^{2k}\lambda_\nu\,|E^{\nu}\cdot W^{*}|^{2}. \tag{11}$$

The curves (10) and (11) as a function of k will be termed the learning
curves of the filter and output respectively. Each of the L terms relaxes
geometrically with a constant logarithmic slope of
$\log(1-2\mu\lambda_\nu)^{2}$. Thus, each term falls to $e^{-1}$ of its
original value in a time given by

    $$\tau_\nu = -\big[\log(1-2\mu\lambda_\nu)^{2}\big]^{-1}
      \approx \frac{1}{4\mu\lambda_\nu} \qquad \text{for } 2\mu\lambda_\nu \ll 1. \tag{12}$$

Inequality (4) imposes a lower bound on the largest time constant

    $$\tau_{\max} = \frac{1}{4\mu\lambda_{\min}} > \frac{\lambda_{\max}}{4\lambda_{\min}}
      = \tfrac{1}{4}\,(\text{conditioning number}). \tag{13}$$
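To make the size of these quantities concrete, the following sketch compares the exact e-folding time of a mode with the small-step approximation 1/(4μλ), and evaluates the lower bound on the largest time constant. The eigenvalues and step size are hypothetical, chosen only so that μ < 1/λ_max.

```python
import math

def time_constant(mu, lam):
    """Exact e-folding time of the factor (1 - 2*mu*lam)**(2k)."""
    return -1.0 / math.log((1.0 - 2.0 * mu * lam) ** 2)

def time_constant_approx(mu, lam):
    """Small-step approximation 1 / (4*mu*lam), valid for 2*mu*lam << 1."""
    return 1.0 / (4.0 * mu * lam)

# Hypothetical eigenvalues and step size (mu < 1/lam_max).
eigenvalues = [0.5, 2.0, 11.0]
mu = 0.001
lam_min, lam_max = min(eigenvalues), max(eigenvalues)

tau_max = time_constant(mu, lam_min)
bound = lam_max / (4.0 * lam_min)      # conditioning number divided by 4
print(tau_max, time_constant_approx(mu, lam_min), bound)
```

The bound depends only on the eigenvalue spread, whereas the actual largest time constant also scales inversely with the step size, so a small μ can leave it far above the bound.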
Generally, $\tau_{\max}$ is a very conservative estimate for the behavior
of the learning curves. In practice, only those eigenvalues for which the
projection of the final state on the corresponding eigenvector
($|E^{\nu}\cdot W^{*}|^{2}$ for the filter or
$\lambda_\nu|E^{\nu}\cdot W^{*}|^{2}$ for the output) is large will exert a
significant influence on convergence. This aspect will be examined in
detail in succeeding sections.

III.  SINUSOIDS  IN  WHITE  NOISE  (ALE)

In this section we examine the previous theory for the case of M complex
sinusoids in white noise. It is shown that the L-dimensional vector space
of the input may be decomposed into an M-dimensional subspace spanned by
the signal component and its orthogonal complement. This permits a
simplified calculation of the eigenvalues of the correlation matrix
(equation (26)). Then, by restricting ourselves to the ALE, we find that
W* lies in the signal subspace. It follows that all eigenvectors outside
that space are orthogonal to W* and, hence, their eigenvalues do not enter
into the learning curves of equations (10) and (11).

One  Sinusoid 

Consider the case of a single sine wave in white noise,
$x(\ell) = \sqrt{2}\,\sigma_s\cos(\omega\ell+\theta) + \sigma_0 N(\ell)$,
where $\theta$ is a uniformly distributed random variable, and $N(\ell)$ is
zero-mean white noise with unit variance. Define the L-dimensional
vector a by

    $$a_\ell = \sigma_s e^{i\omega\ell}, \qquad \ell = 1, \ldots, L. \tag{14}$$
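Then the autocorrelation matrix of x is given by

    $$R = R_s + \sigma_0^{2}I = \tfrac{1}{2}(\phi + \bar{\phi}) + \sigma_0^{2}I \tag{15}$$

where

    $$\phi_{\ell k} = a_\ell\bar{a}_k = \sigma_s^{2}e^{i\omega(\ell-k)}. \tag{16}$$

Let us examine the effect of $R_s$ on the two-dimensional subspace spanned
by $a$ and $\bar{a}$:

    $$(\phi a)_\ell = \sum_k a_\ell\bar{a}_k a_k = L\sigma_s^{2}\,a_\ell, \qquad
      (\phi\bar{a})_\ell = \sum_k a_\ell\bar{a}_k\bar{a}_k = \Big(\sum_k\bar{a}_k^{2}\Big)a_\ell, \qquad
      \bar{\phi}a = \Big(\sum_k a_k^{2}\Big)\bar{a}. \tag{17}$$

Thus, the subspace is invariant (is mapped into itself) under $R_s$, and a
matrix representation of $R_s$ in that subspace, with respect to the basis
$(a, \bar{a})$, is

    $$\frac{\sigma_s^{2}}{2}\begin{pmatrix} L & \bar{z} \\ z & L \end{pmatrix},
      \qquad z = \sum_{\ell=1}^{L}e^{2i\omega\ell}. \tag{18}$$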
Note that the eigenvalues of the matrix (18) will also be eigenvalues
of $R_s$. The characteristic equation is quadratic with roots

    $$\lambda_{\pm} = \frac{\sigma_s^{2}}{2}(L \pm |z|), \qquad
      \text{where } |z| = \Big|\sum_{\ell}e^{2i\omega\ell}\Big|
      = \Big|\frac{\sin\omega L}{\sin\omega}\Big|. \tag{19}$$

Since the columns of $R_s$ are linear combinations of $a$ and $\bar{a}$, its
rank is at most two, and the other L-2 eigenvalues must be zero. Hence L-2
eigenvalues of R are $\sigma_0^{2}$ and the remaining two are
$\lambda_{\pm} + \sigma_0^{2}$. Furthermore, it will be shown in the next
subsection that, for the ALE of Figure 1b, W* lies in the subspace spanned
by $a$ and $\bar{a}$; and, thus, that the "growth" curves only depend on
the eigenvalues $\lambda_{\pm} + \sigma_0^{2}$. We also note that, for
$\omega$ bounded away from 0 and $\pi$, $\lambda_{\pm} \approx \sigma_s^{2}L/2$;
consequently, the learning curves exhibit a single time constant,

    $$\tau = \frac{1}{4\mu\big(\sigma_s^{2}\frac{L}{2} + \sigma_0^{2}\big)}. \tag{20}$$


This is to be contrasted with the estimate of equation (13)

    $$\tau_{\max} = \frac{1}{4\mu\lambda_{\min}} = \frac{1}{4\mu\sigma_0^{2}}
      = \tau\Big(\frac{L}{2}\frac{\sigma_s^{2}}{\sigma_0^{2}} + 1\Big) \tag{21}$$
which, although much larger than $\tau$ except at very small signal-to-noise
ratios, will not be observed when the initial state $\bar{W}(0)$ is zero (the
mode corresponding to $\lambda_{\min}$ is orthogonal to $\bar{W}(0)$).
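The two-dimensional signal subspace can be checked numerically. The sketch below assumes the signal part of the correlation matrix has entries σ_s² cos ω(ℓ-k), builds a candidate eigenvector as a combination of a and its conjugate, and measures the residual against the eigenvalue (σ_s²/2)(L ± |z|) with z the sum of exp(2iωℓ). The helper names and the values of L and ω are illustrative assumptions.

```python
import cmath
import math

def signal_eigenpair(L, omega, sigma_s2=1.0, sign=+1):
    """Eigenvalue (sigma_s^2/2)(L +/- |z|) and a matching eigenvector of R_s."""
    # a_l = sigma_s * exp(i*omega*l), l = 1..L
    a = [math.sqrt(sigma_s2) * cmath.exp(1j * omega * l) for l in range(1, L + 1)]
    z = sum(cmath.exp(2j * omega * l) for l in range(1, L + 1))
    lam = 0.5 * sigma_s2 * (L + sign * abs(z))
    # Coefficients (|z|, sign*z) with respect to the basis (a, conj(a)).
    vec = [abs(z) * al + sign * z * al.conjugate() for al in a]
    return lam, vec

def apply_Rs(vec, L, omega, sigma_s2=1.0):
    """Multiply by R_s, whose entries are sigma_s^2 * cos(omega * (l - k))."""
    return [sum(sigma_s2 * math.cos(omega * (l - k)) * vec[k] for k in range(L))
            for l in range(L)]

L, omega = 16, 0.7                      # hypothetical filter length and frequency
lam_plus, e_plus = signal_eigenpair(L, omega, sign=+1)
residual = max(abs(r - lam_plus * v)
               for r, v in zip(apply_Rs(e_plus, L, omega), e_plus))
print(lam_plus, residual)
```

A residual at rounding-error level indicates that the vector is indeed an eigenvector of the signal matrix, so adding σ_0² gives the corresponding eigenvalue of R.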

M Sinusoids 

It is easy to generalize the foregoing technique to the case of M
sinusoids. Define M vectors $a^{s}$, each of dimension L, by

    $$a^{s}_\ell = \sigma_s e^{i\omega_s\ell}, \qquad s = 1, \ldots, M, \quad
      \ell = 1, \ldots, L \tag{22}$$

where $\omega_s$ may be negative in order to include negative frequency
components. Define the LxL matrix

    $$\phi_{\ell k} = \sum_{s=1}^{M}a^{s}_\ell\,\bar{a}^{s}_k
      = \sum_{s}\sigma_s^{2}e^{i\omega_s(\ell-k)}. \tag{23}$$

We now look for an eigenvector of the form $E = \sum_s y_s a^{s}$ with
eigenvalue $\lambda$:

    $$(\phi E)_\ell = \sum_{r,s,k}y_s\,a^{r}_\ell\,\bar{a}^{r}_k\,a^{s}_k
      = \lambda E_\ell = \lambda\sum_r y_r a^{r}_\ell. \tag{24}$$

Let

    $$B_{rs} = \sum_k \bar{a}^{r}_k\,a^{s}_k. \tag{25}$$

Then, since the vectors $a^{s}$ are linearly independent, equation (24)
implies

    $$\sum_s B_{rs}\,y_s = \lambda y_r. \tag{26}$$

Note that the M-dimensional subspace spanned by the vectors $a^{s}$ is
invariant under $\phi$, that $\phi$ is represented by the matrix B with
respect to that basis, and that B has rank M. Since $\phi$ also has rank M,
its remaining L-M eigenvalues are zero.


Also, for the ALE of Figure 1b,

    $$d(\ell) = \sum_s \sigma_s e^{i(\omega_s\ell + \theta_s)} + \sigma_0 N(\ell)$$

    $$x(\ell) = d(\ell) + \sigma_0 N(\ell)$$

and

    $$P(\ell) = \sum_s \sigma_s^{2}e^{i\omega_s(\ell+\delta-1)}, \tag{27}$$

a linear combination of the $a^{s}$ (see [3] for details). Thus,
$W^{*} = R^{-1}P = (I + B)^{-1}P$ is also contained in the subspace of the
$a^{s}$. Furthermore, since $\phi$ is Hermitian, those eigenvectors of
$\phi$ not belonging to the matrix B are orthogonal to the above subspace
and hence to W*. We conclude that at most M eigenvalues are pertinent to
the learning curves, and they are solutions to equation (26).
Example 1

Let $a^{1}_\ell = \frac{\sigma_s}{\sqrt{2}}e^{i\omega\ell}$;
$a^{2}_\ell = \frac{\sigma_s}{\sqrt{2}}e^{-i\omega\ell}$. Then
$\phi_{\ell k} = \sigma_s^{2}\cos\omega(\ell-k)$ and B is the matrix given
in (18).
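Example 2

Let $a^{1}_\ell = \frac{\sigma_1}{\sqrt{2}}e^{i\omega_1\ell}$,
$a^{2}_\ell = \frac{\sigma_2}{\sqrt{2}}e^{i\omega_2\ell}$,
$a^{3} = \bar{a}^{1}$, $a^{4} = \bar{a}^{2}$, so that

    $$\phi_{\ell k} = \sigma_1^{2}\cos\omega_1(\ell-k) + \sigma_2^{2}\cos\omega_2(\ell-k).$$

Let $\Delta = \omega_1 - \omega_2$ and $\omega = \omega_1 + \omega_2$. Then
B is given by (all sums run over $k = 1, \ldots, L$; the dotted lines
separate the 2x2 blocks)

    $$B = \frac{1}{2}\left(\begin{array}{cc:cc}
      L\sigma_1^{2} & \sigma_1\sigma_2\sum e^{-i\Delta k} & \sigma_1^{2}\sum e^{-2i\omega_1 k} & \sigma_1\sigma_2\sum e^{-i\omega k} \\
      \sigma_1\sigma_2\sum e^{i\Delta k} & L\sigma_2^{2} & \sigma_1\sigma_2\sum e^{-i\omega k} & \sigma_2^{2}\sum e^{-2i\omega_2 k} \\ \hdashline
      \sigma_1^{2}\sum e^{2i\omega_1 k} & \sigma_1\sigma_2\sum e^{i\omega k} & L\sigma_1^{2} & \sigma_1\sigma_2\sum e^{i\Delta k} \\
      \sigma_1\sigma_2\sum e^{i\omega k} & \sigma_2^{2}\sum e^{2i\omega_2 k} & \sigma_1\sigma_2\sum e^{-i\Delta k} & L\sigma_2^{2}
      \end{array}\right). \tag{28}$$

We note that
$\big|\sum_{k=1}^{L}e^{ixk}\big| = \big|\frac{\sin(Lx/2)}{\sin(x/2)}\big|$,
which is much less than L for large L and
$\frac{2\pi}{L} < x < 2\pi - \frac{2\pi}{L}$. Thus, provided

    $$x = 2\omega_1,\ 2\omega_2,\ \text{and } \omega_1+\omega_2
      \text{ each lie in this range,} \tag{29}$$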


the off-diagonal matrices (as indicated by the dotted lines) of the entire
matrix (28) may be neglected (i.e., the positive and negative frequency
components uncouple). The remaining 2x2 matrices are identical and have
eigenvalues

    $$\lambda_{\pm} = \frac{L}{2}\cdot\frac{\sigma_1^{2}+\sigma_2^{2}}{2}
      \pm \frac{1}{4}\sqrt{L^{2}(\sigma_1^{2}-\sigma_2^{2})^{2} + 4|z|^{2}\sigma_1^{2}\sigma_2^{2}},
      \qquad \text{where } |z| = \Big|\frac{\sin(L\Delta/2)}{\sin(\Delta/2)}\Big|. \tag{30}$$


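For $\Delta L \gg 1$; i.e., a large separation of the sinusoids compared to
the filter resolution, $|z| \ll L$ and

    $$\lambda_{+} \approx \frac{L}{2}\max(\sigma_1^{2},\sigma_2^{2}), \qquad
      \lambda_{-} \approx \frac{L}{2}\min(\sigma_1^{2},\sigma_2^{2}), \qquad \Delta L \gg 1. \tag{31}$$

On the other hand, when $\Delta L \ll 1$,

    $$\lambda_{+} \approx \frac{L}{2}(\sigma_1^{2}+\sigma_2^{2}), \qquad
      \lambda_{-} \approx \frac{\sigma_1^{2}\sigma_2^{2}}{\sigma_1^{2}+\sigma_2^{2}}\,
      \frac{L^{3}\Delta^{2}}{24}, \qquad \Delta L \ll 1, \tag{32}$$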
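and, if $\sigma_1^{2} = \sigma_2^{2}$, $\lambda_{-}$ reduces to

    $$\lambda_{-} \approx \sigma_1^{2}\frac{L^{3}\Delta^{2}}{48}, \qquad
      \Delta L \ll 1, \quad \sigma_1^{2} = \sigma_2^{2}. \tag{33}$$

Note that (33) could have been obtained from (19) by translating
$\frac{\omega_1+\omega_2}{2}$ to zero and substituting
$\frac{\Delta}{2} = \frac{\omega_1-\omega_2}{2}$ for the single
frequency $\omega$.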
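The small-separation limit can be checked against the exact eigenvalues of the 2x2 block. This sketch assumes the block has diagonal entries Lσ_i²/2 and off-diagonal entries of magnitude (σ_1σ_2/2)|z|, with |z| = |sin(LΔ/2)/sin(Δ/2)|; the function name and the values of L and Δ are illustrative.

```python
import math

def close_pair_eigenvalues(L, delta, s1=1.0, s2=1.0):
    """Eigenvalues of the 2x2 block coupling two sinusoids separated by delta.

    s1 and s2 are the powers sigma_1^2 and sigma_2^2.
    """
    z = abs(math.sin(L * delta / 2.0) / math.sin(delta / 2.0))
    mean = L * (s1 + s2) / 4.0
    root = math.sqrt(L**2 * (s1 - s2) ** 2 + 4.0 * z**2 * s1 * s2) / 4.0
    return mean + root, mean - root

L, delta = 16, 0.01                      # hypothetical values with L*delta << 1
lam_plus, lam_minus = close_pair_eigenvalues(L, delta)
approx_minus = L**3 * delta**2 / 48.0    # small-separation limit, equal powers
print(lam_plus, lam_minus, approx_minus)
```

For these values the larger eigenvalue is close to L times the common power, while the smaller one is several orders of magnitude below it, which is the disparity that can make one signal-subspace mode very slow.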


For two equal amplitude sinusoids close in frequency, expressions (32)
and (33) are valid. In that case, even in the signal subspace, there may
be a large disparity in the eigenvalues of $R = \phi + \sigma_0^{2}I$:

    $$\lambda_1 = \sigma_0^{2}(1 + qL), \qquad
      \lambda_2 = \sigma_0^{2}\Big(1 + q\frac{L^{3}\Delta^{2}}{48}\Big)$$

where

    $$q = \frac{\sigma_1^{2}}{\sigma_0^{2}}$$

is the signal-to-noise ratio. However, $\lambda_1$ or $\lambda_2$ will
influence the learning curves only if the corresponding scalar product,
$E^{1}\cdot W^{*}$ or $E^{2}\cdot W^{*}$, is relatively large (equations
(10) and (11)).


The  projections  of  the  eigenvectors  on  W*  are  computed,  up  to  a constant 
factor,  in  Appendix  A with  the  result 
| E1  *W*  | 2 ~ (2  + ^L3-  )2


| E2 -W* | 2 ~ q2A2LA 


For  large  q or  relatively  large  L (note  that  LA  « 1 and  L T » 1 can  be 
satisfied  simultaneously),  the  second  mode  will  have  a significant  effect 
on  the  weight  learning  curve.  Its  time  constant 


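    $$\tau_2 = \frac{1}{4\mu\sigma_0^{2}\big(1 + q\frac{L^{3}\Delta^{2}}{48}\big)}
      = \frac{(1 + qL)\,\tau_1}{1 + q\frac{L^{3}\Delta^{2}}{48}}
      \approx \frac{\tau_1}{\frac{1}{qL} + \frac{L^{2}\Delta^{2}}{48}}$$

is in general much longer than $\tau_1$.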

To  study  the  output  learning  curve,  we  examine  the  relative  magnitudes  of 


A L | E1  *W* | 2~  02(1  + qL)  (2  + -a~ — )2 


X I E2-W* I ~ CT2(1  + q 


3 2 

I/V  v 2A 2,  4 

— TS— > 1“ 


°oqL(1  + q ‘ 


From (38) it is seen that the second mode is relevant only when
$q\Delta^{2}L^{3} \ge 1$. If this condition, somewhat stronger than (36),
holds, the second mode will slow the convergence of the output as well as
that of the weights. These results have been confirmed experimentally [11].


IV.  THEORY  - GENERAL  SPECTRA 

For the infinite case ($L = \infty$), determining the eigenvalues and
eigenvectors of the autocorrelation matrix reduces to finding its spectrum.
In addition, if the transfer function of the optimal filter is known, it is
not difficult to describe or approximate the learning curves. In this
section, these properties are derived, and it is shown that for large L the
convergence characteristics of the LMS adaptive filter are approximated by
those for $L = \infty$. A more detailed treatment of the mathematics may be
found in reference [5].

Let the input process $x(\ell)$ be stationary, and the z-transform [6]
of its autocorrelation function given by

    $$G(z) = \sum_{\ell-k=-\infty}^{\infty}E\{x(\ell)x(k)\}\,z^{k-\ell} \tag{39}$$

where E denotes expectation. Assume that G(z) is continuous and has no
zeroes or poles on the unit circle. (Note that the sinusoids of the previous
section are a singular case.) Then the power spectrum of x may be written

    $$S(\omega) = G(e^{i\omega}) \tag{40}$$
and its correlation matrix admits the representation

    $$\phi_{\ell-k} = E[x(\ell)x(k)]
      = \frac{1}{2\pi}\int_{-\pi}^{\pi}S(\omega)e^{i(\ell-k)\omega}\,d\omega. \tag{41}$$

Assume further that the vector P possesses the same properties:

    $$p_\ell = E(x\,d)
      = \frac{1}{2\pi}\int_{-\pi}^{\pi}S_{xd}(\omega)e^{i\ell\omega}\,d\omega,
      \qquad \ell = 0, 1, \ldots \tag{42}$$

For the ALE, if x is a signal process plus white noise,

    $$p_\ell = \frac{1}{2\pi}\int_{-\pi}^{\pi}S_s(\omega)e^{-i\delta\omega}e^{i\omega\ell}\,d\omega. \tag{43}$$

The finite LxL matrix of equation (2) may be written

    $$R_{\ell k} = \phi_{\ell-k}, \qquad \ell, k = 0, \ldots, L-1. \tag{44}$$

We introduce the infinite dimensional vector norm

    $$\|y\|^{2} = \sum_{\ell=0}^{\infty}|y_\ell|^{2}. \tag{45}$$
All L-dimensional vectors may be considered as embedded in this space by
setting those components with indices greater than L equal to zero.

It is shown in Appendix B that W*, the solution of (5) considered as
a function of L, converges to a bounded solution f of the infinite
Wiener-Hopf equation as $L \to \infty$; i.e.,

    $$\lim_{L\to\infty}\|W^{*} - f\| = 0$$

where

    $$\sum_{k=0}^{\infty}\phi_{\ell-k}\,f_k = p_\ell, \qquad \ell = 0, 1, \ldots \tag{46}$$

We mention here that the matrices R and $\phi$ are Toeplitz [5] and, as
such, possess the following properties (cf. equations (B-4) and (B-9)): Let
m and M be the minimum and maximum of $|S(\omega)|$ (non-zero and finite by
the assumptions on G(z)); and let $\lambda_\nu$, $\nu = 1, \ldots, L$ be the
eigenvalues of R. Then

    $$m \le \lambda_\nu \le M \tag{47}$$

where

    $$\lim_{L\to\infty}\lambda_{\min} = m, \qquad
      \lim_{L\to\infty}\lambda_{\max} = M. \tag{48}$$


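In [5] it is also shown that the matrix R is approximated (in a
Hilbert-Schmidt norm) by the matrix $\hat{R}$,

    $$\hat{R}_{\ell k} = \frac{1}{L}\sum_{\nu=0}^{L-1}
      S\Big(\frac{2\pi\nu}{L}\Big)e^{2\pi i\nu(\ell-k)/L}, \tag{49}$$

which has eigenvalues $S(\frac{2\pi\nu}{L})$ and normalized eigenvectors
$e^{2\pi i\nu\ell/L}/\sqrt{L}$.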
Thus, if $E^{\nu}$ are the eigenvectors of R, we have "approximately"

    $$\lambda_\nu \approx S\Big(\frac{2\pi\nu}{L}\Big) \tag{50}$$

    $$(E^{\nu})_\ell \approx \frac{1}{\sqrt{L}}\,e^{2\pi i\nu\ell/L},
      \qquad \ell = 0, \ldots, L-1. \tag{51}$$


It is, therefore, not surprising that the following expressions (derived
in Appendix B) hold for the learning curves (10) and (11). Let

    $$F(\omega) = \sum_{\ell=0}^{\infty}f_\ell\,e^{-i\omega\ell}; \tag{52}$$

then

    $$\lim_{L\to\infty}\|W^{*} - \bar{W}(k)\|^{2}
      = \frac{1}{2\pi}\int_{-\pi}^{\pi}(1-2\mu S(\omega))^{2k}\,|F(\omega)|^{2}\,d\omega \tag{53}$$

and

    $$\lim_{L\to\infty}\xi(k) - \xi^{*}
      = \frac{1}{2\pi}\int_{-\pi}^{\pi}(1-2\mu S(\omega))^{2k}\,S(\omega)\,|F(\omega)|^{2}\,d\omega. \tag{54}$$

Note that, heuristically, from equations (50) and (51),
$|E^{\nu}\cdot f|^{2} \approx \frac{1}{L}\big|\sum_{\ell=0}^{L-1}f_\ell\,e^{-2\pi i\nu\ell/L}\big|^{2}
\approx \frac{1}{L}\big|F(\frac{2\pi\nu}{L})\big|^{2}$, and that (10) and
(11) correspond to approximating sums of the integrals in (53) and (54).
The factor $\frac{1}{L}$ corresponds to $\frac{d\omega}{2\pi}$. Thus, a
large value of $F(\omega)$ will not be significant unless it has some
bandwidth. It is the integrated power over a given spectral region that is
relevant.
In summary, for large L,

    the eigenvectors $E^{\nu}$           are indexed by   $\omega = 2\pi\nu/L$,
    the eigenvalue $\lambda_\nu$         corresponds to   $S(\omega)$,
    and $|W^{*}\cdot E^{\nu}|^{2}$       corresponds to   $|F(\omega)|^{2}\,\frac{d\omega}{2\pi}$.

The learning curves for the weight vector and output correspond to a sum of
modes given by

    $$|F(\omega)|^{2}(1-2\mu S(\omega))^{2k}\,\frac{\Delta\omega}{2\pi}
      \qquad \text{and} \qquad
      |F(\omega)|^{2}S(\omega)(1-2\mu S(\omega))^{2k}\,\frac{\Delta\omega}{2\pi} \tag{55}$$

respectively, where the spectrum has been divided into regions of width
$\Delta\omega$. The time constant of mode $\omega$ is thus

    $$\tau(\omega) = (4\mu S(\omega))^{-1}. \tag{56}$$

Finally, it follows from (47) that the conditioning number approaches

    $$\frac{\lambda_{\max}}{\lambda_{\min}} \to
      \frac{\max_\omega S(\omega)}{\min_\omega S(\omega)} \tag{57}$$

and the eigenvalues of R satisfy

    $$\min_\omega S(\omega) \le \lambda_\nu \le \max_\omega S(\omega). \tag{58}$$
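The mode sums of (55) lend themselves to direct evaluation. The following sketch approximates the integrals (53) and (54) by a sum over equal-width frequency bins; the spectrum `S` and squared filter response `F2` are hypothetical smooth functions chosen only for illustration, and the step size is picked so that 2μS(ω) stays below one.

```python
import math

def learning_curves(S, F2, mu, k, modes=200):
    """Approximate (53) and (54) by a sum of `modes` equal-width frequency bins."""
    d_omega = 2.0 * math.pi / modes
    weight_curve = 0.0
    output_curve = 0.0
    for n in range(modes):
        omega = -math.pi + (n + 0.5) * d_omega      # bin centre
        decay = (1.0 - 2.0 * mu * S(omega)) ** (2 * k)
        weight_curve += F2(omega) * decay * d_omega / (2.0 * math.pi)
        output_curve += F2(omega) * S(omega) * decay * d_omega / (2.0 * math.pi)
    return weight_curve, output_curve

# Hypothetical smooth spectrum and filter response, for illustration only.
S = lambda w: 1.0 + 4.0 / (1.25 - math.cos(w))
F2 = lambda w: 1.0 / (1.25 - math.cos(w))

mu = 0.01
for k in (0, 50, 200):
    print(k, learning_curves(S, F2, mu, k))
```

Because the output sum weights each mode by S(ω), the fast modes (large S) carry relatively more of the output curve, so its normalized value falls faster than that of the weight curve.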
where


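It may be easily shown (cf. [8]) by substituting (66) in (61) that

    $$b = e^{-\beta}e^{i\omega_0}, \qquad
      \frac{\sigma_s^{2}}{\sigma_0^{2}} = \frac{\cosh\beta - \cosh\alpha}{\sinh\alpha}.$$

Let the signal-to-noise ratio be given by

    $$q = \frac{\sigma_s^{2}}{\sigma_0^{2}}.$$

For $\alpha \ll 1$ (relatively narrowband) and $q\alpha \ll 1$, we have the
approximation

    $$\beta \approx \sqrt{\alpha^{2} + 2\alpha q}.$$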
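The narrowband approximation for β can be compared with an exact value. The sketch below assumes the relation cosh β = cosh α + q sinh α, which is what results from adding a white-noise floor to a single-pole signal spectrum; the function names are illustrative. The last line applies the inverse relation to α = 0.2 and β = 0.915, the parameter values used in the simulation reported later in the text.

```python
import math

def beta_exact(alpha, q):
    """Solve cosh(beta) = cosh(alpha) + q*sinh(alpha) for beta >= 0."""
    return math.acosh(math.cosh(alpha) + q * math.sinh(alpha))

def beta_narrowband(alpha, q):
    """Approximation sqrt(alpha^2 + 2*alpha*q), for alpha << 1 and q*alpha << 1."""
    return math.sqrt(alpha**2 + 2.0 * alpha * q)

def snr_from_poles(alpha, beta):
    """Invert the same relation to recover q from alpha and beta."""
    return (math.cosh(beta) - math.cosh(alpha)) / math.sinh(alpha)

print(beta_exact(0.01, 2.0), beta_narrowband(0.01, 2.0))
print(10.0 * math.log10(snr_from_poles(0.2, 0.915)))
```

For α = 0.2 and β = 0.915 the recovered ratio is close to the 3.3 dB signal-to-noise ratio quoted for the simulation, which supports the assumed relation.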


Let us first compute the conditioning number via expression (57).
From (60), on the unit circle

    $$S_s(\omega) = \sigma_s^{2}\,
      \frac{1 - e^{-2\alpha}}{e^{-2\alpha} - 2e^{-\alpha}\cos(\omega-\omega_0) + 1}. \tag{67}$$

Also, since

    $$S(\omega) = \sigma_0^{2} + S_s(\omega),$$

it is clear that the maximum of S occurs at $\omega = \omega_0$ and the
minimum at $\omega = \omega_0 + \pi$.
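Thus,

    $$\lambda_{\max} \approx \sigma_0^{2} + \sigma_s^{2}\,\frac{1 + e^{-\alpha}}{1 - e^{-\alpha}}, \qquad
      \lambda_{\min} \approx \sigma_0^{2} + \sigma_s^{2}\,\frac{1 - e^{-\alpha}}{1 + e^{-\alpha}}. \tag{68}$$

For a narrowband (compared to Nyquist) signal, $\alpha \ll 1$ and

    $$\frac{\lambda_{\max}}{\lambda_{\min}} \approx
      \frac{1 + \frac{2q}{\alpha}}{1 + \frac{\alpha q}{2}}, \qquad \alpha \ll 1. \tag{69}$$

If $q\alpha$ is also small,

    $$\frac{\lambda_{\max}}{\lambda_{\min}} \approx 1 + \frac{2q}{\alpha}, \qquad q\alpha \ll 1. \tag{70}$$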


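It is easily shown by classical techniques ([9], [10]) that the transfer
function of the optimal filter is given by

    $$F(\omega) = \frac{c\,e^{i\omega\delta}}{1 - b\,e^{-i\omega}}
      = \frac{c\,e^{i\omega\delta}}{1 - e^{-\beta}e^{-i(\omega-\omega_0)}} \tag{71}$$

where c is a constant. Hence

    $$|F(\omega)|^{2} = \frac{c^{2}}{1 - 2e^{-\beta}\cos(\omega-\omega_0) + e^{-2\beta}}. \tag{72}$$

Also, from (61) and (62)

    $$G(z) = \sigma_0^{2}\,
      \frac{|b|^{-1}(z-b)(z^{-1}-\bar{b})}{|a|^{-1}(z-a)(z^{-1}-\bar{a})}$$

so that

    $$S(\omega) = \sigma_0^{2}\,\frac{e^{\beta}}{e^{\alpha}}\,
      \frac{1 - 2e^{-\beta}\cos(\omega-\omega_0) + e^{-2\beta}}
           {1 - 2e^{-\alpha}\cos(\omega-\omega_0) + e^{-2\alpha}}
      = \sigma_0^{2}\,\frac{\cosh\beta - \cos(\omega-\omega_0)}{\cosh\alpha - \cos(\omega-\omega_0)}. \tag{73}$$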


For small $\beta$, the function (72) is very sharp and only the region
$[\omega_0-\beta,\ \omega_0+\beta]$ influences the integrals (53) and (54).
As $\beta$ increases, the spectrum of F flattens out until we must include
the entire interval $[-\pi, \pi]$. Let $\beta' = \beta-\alpha$. To
approximate the learning curves, we divide the interval
$|\omega-\omega_0| \le 2\beta'+\alpha$ into three regions:
$(\omega_0-\alpha,\ \omega_0+\alpha)$,
$(\omega_0-2\beta'-\alpha,\ \omega_0-\alpha)$, and
$(\omega_0+\alpha,\ \omega_0+2\beta'+\alpha)$. They are centered at
$\omega_0$, $\omega_0-\beta$ and $\omega_0+\beta$, and have widths
$(\Delta\omega)_1 = 2\alpha$ and $(\Delta\omega)_2 = (\Delta\omega)_3 = 2\beta'$
respectively. This is shown diagrammatically in Figure 2.


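Figure 2.  Diagram of a spectrum partition.

For small $\beta$ and $\omega$, $\cosh\beta \approx 1 + \frac{\beta^{2}}{2}$
and $\cos\omega \approx 1 - \frac{\omega^{2}}{2}$, and (73) may be
approximated by

    $$S(\omega) \approx \sigma_0^{2}\,
      \frac{\beta^{2} + (\omega-\omega_0)^{2}}{\alpha^{2} + (\omega-\omega_0)^{2}},
      \qquad \beta \ll 1. \tag{74}$$

Likewise, $|F(\omega)|^{2}$ is approximately

    $$|F(\omega)|^{2} \approx \frac{c^{2}e^{\beta}}{\beta^{2} + (\omega-\omega_0)^{2}},
      \qquad \beta \ll 1. \tag{75}$$

The time constants for two of the regions are the same, so that

    $$S(\omega_0) \approx \sigma_0^{2}\frac{\beta^{2}}{\alpha^{2}}, \qquad
      S(\omega_0\pm\beta) \approx \sigma_0^{2}\frac{2\beta^{2}}{\alpha^{2}+\beta^{2}},$$

    $$(\Delta\omega)_1|F(\omega_0)|^{2} \approx 2\alpha\,\frac{c^{2}e^{\beta}}{\beta^{2}}, \qquad
      2(\Delta\omega)_2|F(\omega_0\pm\beta)|^{2} \approx 4(\beta-\alpha)\,\frac{c^{2}e^{\beta}}{2\beta^{2}}. \tag{76}$$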

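To simplify the expressions, we will assume that $\alpha \ll \beta$ (this
is the case except at extremely low signal-to-noise ratios). The
substitution of (76) into (53) and (54) then yields a sum of terms of the
form (55). They are, up to the factor $2c^{2}e^{\beta}$,

    filter:
    $$\frac{\alpha}{\beta^{2}}\Big(1 - 2\mu\sigma_0^{2}\frac{\beta^{2}}{\alpha^{2}}\Big)^{2k}
      + \frac{1}{\beta}\,(1 - 4\mu\sigma_0^{2})^{2k} \tag{77}$$

    output:
    $$\frac{\sigma_0^{2}}{\alpha}\Big(1 - 2\mu\sigma_0^{2}\frac{\beta^{2}}{\alpha^{2}}\Big)^{2k}
      + \frac{2\sigma_0^{2}}{\beta}\,(1 - 4\mu\sigma_0^{2})^{2k}. \tag{78}$$

In the above approximation, the filter exhibits two time constants:

    $$\tau_1 = \frac{1}{4\mu\sigma_0^{2}}\,\frac{\alpha^{2}}{\beta^{2}}
      \qquad \text{and} \qquad
      \tau_2 = \frac{1}{8\mu\sigma_0^{2}}. \tag{79}$$

In contrast, since the slower second term in (78) is negligible (for
$\alpha \ll \beta$), the convergence of the output will depend essentially
on only one time constant,
$\tau_1 = \frac{1}{4\mu\sigma_0^{2}}\frac{\alpha^{2}}{\beta^{2}}$. Note that
a closer approximation (more modes) to equation (54) yields a second time
constant at $\omega_0\pm\alpha$:
$[4\mu S(\omega_0\pm\alpha)]^{-1} \approx
\frac{1}{4\mu\sigma_0^{2}}\frac{2\alpha^{2}}{\beta^{2}} = 2\tau_1$. Thus,
one would expect the logarithmic slope of the actual learning curve for the
output to vary between that corresponding to $\tau_1$ and that corresponding
to $2\tau_1$ during the period of significant convergence.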

The above results offer the following interpretation. Initially, the
transfer function of the filter will be narrowly centered about $\omega_0$
(the modes $\omega$ for $\omega$ close to $\omega_0$ converge rapidly).
With time the filter widens (the slower modes converge). On the other hand,
the output converges quickly, since for a narrowband signal, there is
little error reduction as the filter widens. This is particularly true at
high SNR ($\beta \gg \alpha$); the filter can afford to get quite wide
since there is very little noise to let through. At low SNR the optimal
filter is narrow ($\beta \approx \alpha$), and this "filter growth" will
not usually be observed.


The same considerations hold for large $\beta$ ($\beta \ge 1$), although
the approximation (74) is not valid and must be replaced by (73). For
$\beta > 1$, $|F(\omega)|^{2}$ is approximately constant (equation (72)),
all modes are equally important, and convergence will be limited by the
larger time constants. On the other hand, the modes of the output learning
curve fall off as $S(\omega)$ (cf. equation (55)), and only those
corresponding to $|\omega-\omega_0| \le \alpha$ will be significant.


A computer simulation of the ALE was run using pseudo-random noise
input with a correlation matrix given by equation (59) and parameters
L = 64, $\omega_0$ = 1.5, $\alpha$ = 0.2, $\mu$ = 0.0008, and $\beta$ = 0.915.
The results were then ensemble-averaged to obtain $\bar{W}$. It follows
from (63) that SNR = $\sigma_s^{2}/\sigma_0^{2}$ = 3.3 dB. A plot of
equation (56), time constant versus frequency, appears in Figure 3. It is
seen that the fastest mode is about $\tau$ = 30 and the slowest,
$\tau$ = 525. The conditioning number is thus 17.5.
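The extreme time constants quoted here can be approximately reproduced from the time-constant formula (56) and the cosh form of the spectrum (73). In the sketch below the noise power `sigma0_sq = 0.5` is an assumption (it is read from the Figure 3 caption, where the symbol is not legible); the function names are illustrative.

```python
import math

def spectrum(omega, omega0, alpha, beta, sigma0_sq):
    """Single-pole-plus-noise spectrum in the cosh form of equation (73)."""
    x = math.cos(omega - omega0)
    return sigma0_sq * (math.cosh(beta) - x) / (math.cosh(alpha) - x)

def tau(omega, mu, **params):
    """Time constant of mode omega, equation (56)."""
    return 1.0 / (4.0 * mu * spectrum(omega, **params))

# sigma0_sq = 0.5 is an assumption read from the Figure 3 caption.
params = dict(omega0=1.5, alpha=0.2, beta=0.915, sigma0_sq=0.5)
mu = 0.0008

tau_fast = tau(params["omega0"], mu, **params)             # peak of S
tau_slow = tau(params["omega0"] + math.pi, mu, **params)   # trough of S
print(tau_fast, tau_slow, tau_slow / tau_fast)
```

Under these assumptions the fastest and slowest time constants come out at about 28 and 516, close to the values of 30 and 525 read from the plot.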
Figure 3.  Plot of time constants of the various modes of convergence for
the weight vector (equation (56)), versus $|\omega-\omega_0|$:
$\alpha$ = 0.2, $\beta$ = 0.915, $\mu$ = 0.0008, $\sigma_0^{2}$ = 0.5.

The actual learning curves for the simulation appear in Figures 4
and 5. They are compared with those computed from equations (53) and (54).
(It was found that five modes provided a reasonable approximation to the
integrals, and that anything over ten modes was virtually indistinguishable
from the plotted curves.) The output curves in Figure 5 are in excellent
agreement. There is a very slight discrepancy in the weight vector
learning curves of Figure 4, which may be attributed to the initial stages
of convergence. The independence assumption on which equation (1) is based
is not valid for small k. The ultimate effect is the same as if the initial
state of the ALE, W(0), were not quite 0. In contrast, Figure 5 was
constructed by matching conditions at k = 50.
Figure 4.  Learning curves for the weight vector. Solid line = computer
simulation of ALE, averaged over a 20-point ensemble. Dotted line =
theoretical curve computed from equation (53).

Figure 5.  Learning curves for the output. Solid line = theoretical curve
computed from equation (54). Dots = computer simulation of ALE, averaged
over a 20-point ensemble.

Since $\beta$ = 0.915 is rather large, one would expect the overall
convergence of the weight vector to depend on the slowest mode. From
Figure 3, this is about 500, which corresponds quite well with the
simulation in Figure 4. According to theory, the output should converge
faster, depending mostly on those modes for which
$|\omega-\omega_0| \le \alpha$ = 0.2. From Figure 3, this value is
$\tau$ = 55, which agrees with the $e^{-1}$ downpoint of Figure 5.
Note that the scales of Figures 4 and 5 were chosen to give the same
magnitude for k = 0, and the more rapid convergence of the output curve
is readily apparent.


Figure 6 plots the spectrum at five different times, k = 25, 50, 100,
250, and 500, plus the spectrum of the optimal filter W*. As expected, the
filter grows and widens with time. A more dramatic example is pictured in
Figure 7. In this case, the simulation was run with a real input signal
with correlation function
$\phi(\ell) = \sigma_s^{2}e^{-\alpha|\ell|}\cos\omega_0\ell + \sigma_0^{2}\delta(\ell)$
at a signal-to-noise ratio of $\sigma_s^{2}/2\sigma_0^{2}$ = 15 dB.


Figure 6.  Power spectrum of W(k) for five different times (k = 25, 50,
100, 250, and 500), and the optimal weight vector W* (k = $\infty$),
plotted against $\omega = 2\pi f$. The input correlation function of the
ALE was given by equation (59) with $\alpha$ = 0.2, $\beta$ = 0.915,
L = 64, $\omega_0$ = 1.5, and $\mu$ = 0.0008. The results were averaged
over a 100-point ensemble.

Figure 7.  Power spectrum of W(k) at three different times (k = 160, 320,
and 640), plotted against $\omega = 2\pi f$ from 0 to 6.28. The input
correlation function was
$e^{-\alpha|\ell|}\cos\omega_0\ell + \sigma_0^{2}\delta(\ell)$ with
$\alpha$ = 0.2, $\beta$ = 2.01, L = 64, $\omega_0$ = 1.5, and
$\mu$ = 0.0016. The results were averaged over a 200-point ensemble.

Finally, it should be recalled that the theory of the previous section
is an approximation for large L (L = 64 in the above simulation). For
the case of a single pole spectrum, we can explicitly derive the asymptotic
dependence of the eigenvalues on L. From [5], the eigenvalues of $R_s$
(matrix (59) with $\sigma_0$ = 0) satisfy

    $$S_s(0) \ge \lambda_1 \ge S_s\Big(\frac{\pi}{L+1}\Big) \ge \lambda_2
      \ge S_s\Big(\frac{2\pi}{L+1}\Big) \ge \cdots \tag{80}$$
Let $x_1 = \cos\theta_\nu$, $x_2 = \cos(\theta_\nu + \frac{\pi}{L+1})$, and
$r = e^{-\alpha}$. Then the error in estimating $\lambda_\nu$ is bounded by

    $$\frac{1-r^{2}}{1+r^{2}-2rx_1} - \frac{1-r^{2}}{1+r^{2}-2rx_2}
      = (1-r^{2})\,\frac{2r(x_1-x_2)}{(1+r^{2}-2rx_1)(1+r^{2}-2rx_2)}
      \le \frac{1}{\alpha}(x_1-x_2)\,S_s(x_1)\,S_s(x_2); \qquad \alpha < 1.$$

But  Ss(Xl)  = Ay  - Ss(x2);  |xrx2|  ~ -j- 


. cosO  . Thus 
L V 


error  < -dr  vXl 


(81) 


Except for very large eigenvalues, the percentage error will be small when
$L\alpha \gg 1$; i.e., provided the signal is wider than the ALE
resolution. For the other extreme, $L\alpha \ll 1$, the narrowband signal
will appear essentially sinusoidal.

VI.  CONCLUSION


We have examined the convergence of LMS adaptive filters as reflected
by the learning curves of the weight vector and filter output. These curves
are a sum of modes with decay rates determined by the eigenvalues of the
input correlation matrix. It was shown that, for zero initial conditions,
the importance of a particular mode depends on the projection of its
eigenvector onto the optimal filter W*. For example, in the case of the
adaptive line enhancer with an input of M sinusoids in white noise, only
the M-dimensional "signal subspace" is relevant to convergence time.

One  and  two  sinusoids  were  treated  in  detail.  It  was  shown  that  a 
single  real  sinusoid  (two  complex)  has  two  pertinent  eigenvalues  (M  equals 
2),  and  that  they  are  approximately  equal.  Consequently,  convergence  is  often 
much  quicker  than  would  be  expected  from  an  examination  of  all  the  eigenvalues. 
The  behavior  for  two  sinusoids  (four  complex)  depends  on  the  separation  of 
their  frequencies.  If  they  are  relatively  close,  there  can  be  a large 
disparity in the eigenvalues even in the signal subspace (equation (37)).
The associated eigenvectors were examined, and it was found that there exist
circumstances under which the slower mode may dominate the learning curve.


A simple correspondence may be set up, for large filter lengths,
between the discrete and continuous cases. Indexed by frequency, the
eigenvalues of the correlation matrix correspond to the magnitude of the
power spectrum, and the projections of their eigenvectors on W* correspond
to the magnitude of the filter transfer function. Obvious though this
relationship may be, it provides a powerful means of approximating the LMS
learning curves. A rough knowledge of the input spectrum suffices to evaluate
the condition number and thus set bounds on convergence (equation (57)). A
small amount of additional analysis yields approximations to the learning
curves (equations (53) and (54)).

As an example, we treated the case of a single pole input spectrum.
Simple expressions relating bandwidth, SNR, and approximate convergence
times were derived (equation (79)). In general, the convergence of the
output is faster than that of the weight vector. This is particularly
noticeable at high signal-to-noise ratios. It was also observed that the filter
is initially narrow and widens with time (during convergence). The more
exact expressions for the learning curves, equations (53) and (54), were then
evaluated numerically and proved to be in excellent agreement with a computer
simulation.
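
The faster convergence of the output can be seen from a modal computation. The sketch below assumes the mean-weight model with zero initial weights, in which the weight-error norm weights mode $i$ by $(E_i \cdot W^*)^2$ while the excess output error weights it by $\lambda_i (E_i \cdot W^*)^2$, so that the fast (large $\lambda_i$) modes count for more in the output. The noise power and the one-sample decorrelation delay are arbitrary illustrative choices, gradient noise is ignored, and the function name is hypothetical.

```python
import numpy as np

def learning_curves(L=64, alpha=0.2, omega0=1.5, noise_power=0.01,
                    mu=0.0016, delay=1, n_iter=4000):
    """Normalized weight-error and excess-output-error curves (mean-weight model)."""
    lag = np.arange(L)
    diff = np.abs(lag[:, None] - lag[None, :])
    # Single pole correlation plus white noise on the diagonal.
    R = np.exp(-alpha * diff) * np.cos(omega0 * diff) + noise_power * np.eye(L)
    # ALE cross-correlation: signal correlation at lags delay, delay+1, ...
    P = np.exp(-alpha * (lag + delay)) * np.cos(omega0 * (lag + delay))
    lam, E = np.linalg.eigh(R)
    c2 = (E.T @ np.linalg.solve(R, P)) ** 2        # squared projections of W*
    k = np.arange(n_iter)[:, None]
    decay = (1.0 - 2.0 * mu * lam) ** (2 * k)      # modal decay, shape (n_iter, L)
    weight = (decay * c2).sum(axis=1)              # ||V_k||^2
    output = (decay * lam * c2).sum(axis=1)        # V_k^T R V_k
    return weight / weight[0], output / output[0]

if __name__ == "__main__":
    w, o = learning_curves()
    for k in (0, 100, 500, 2000):
        print(f"k={k:5d}  weight error={w[k]:.4f}  excess output error={o[k]:.4f}")
```

Because the decay factors decrease with $\lambda_i$ whenever $0 < 2\mu\lambda_i < 1$, the normalized output curve in this model never lies above the normalized weight curve.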

The  techniques  which  we  have  developed  often  alleviate  the  need  for 
finding  the  eigenvalues  of  large  matrices  in  order  to  analyze  the  learning 
curves.  In  addition,  they  provide  a physical  context,  the  signal  spectrum, 
in  which  to  evaluate  convergence.  Thus,  it  is  possible,  with  varying  degrees 
of  accuracy  dependent  on  knowledge  of  the  input  spectrum,  to  predict  the 
convergence  behavior  of  the  system  in  question. 
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which  implies 


L + bZ  = 2X  , 


(A-5) 


and, hence,


b 


+ 


1 - 


iAL 

2 


(A-6) 


We  now  calculate  W*.  From  equation  (27),  with  a.  * a_  , <$  * 1,  and 

l1  1 

0)^  ~ (1)2,  P in  the  current  basis  is  proportional  to  (1).  Also,  R is  given 
by 


R 


°0  + 


ofB 


I 

(A- 7) 


where 


2 


Define 


so  that  E 


|E  ! | . Then,  up  to  a constant  factor 


3»2. 


E+  • W* | 2 ~ (2  + L ^ 3~) 


(A-9) 


(A- 10) 


|e"  • w*|2 

where  AL  <<  1 . 
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APPENDIX  B 


For  mathematical  background  and  material  related  to  the  subject 
matter  in  this  appendix,  the  reader  is  referred  to  references  [5]  - [8]. 
We  proceed  to  prove  relation  (46). 


Define the strong matrix norm by

$$\|A\| = \sup_{x} \frac{\|Ax\|}{\|x\|} . \tag{B-1}$$

It follows from this definition [7] that the norm of a finite dimensional
Hermitian matrix B is given by

$$\|B\| = \lambda_{\max} = \text{maximum eigenvalue of } B . \tag{B-2}$$


Let y be a vector in $\ell^2$, i.e., $\sum_{\ell=0}^{\infty} y_\ell^2 < \infty$; then there exists $Y(\omega)$, the
inverse transform of y, such that

$$y_\ell = \frac{1}{2\pi}\int_{-\pi}^{\pi} Y(\omega)\, e^{i\omega\ell}\, d\omega . \tag{B-3}$$


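The convolution theorem for Fourier series and Plancherel's theorem imply

$$\|\phi y\|^2 = \sum_{\ell=0}^{\infty}\Big(\sum_{k=0}^{\infty}\phi_{\ell k}\,y_k\Big)^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |Y(\omega)|^2\,|S(\omega)|^2\, d\omega . \tag{B-4}$$

Thus

$$\|\phi\| = \sup_{y} \frac{\|\phi y\|}{\|y\|} = \max_{\omega} |S(\omega)| . \tag{B-5}$$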


Let $R^L$ be the operator $\phi$ restricted to the L-dimensional subspace of
the first L basis vectors:

$$(R^L y)_\ell = \begin{cases} \displaystyle\sum_{k=0}^{L-1} \phi_{\ell k}\, y_k , & \ell \le L-1 \\[1ex] 0 , & \ell \ge L . \end{cases} \tag{B-6}$$

Thus, $R^L$ is identical to R of equation (44). If we define the projection
operator $Q^L$ by

$$(Q^L y)_\ell = \begin{cases} y_\ell , & \ell \le L-1 \\ 0 , & \ell \ge L \end{cases} \tag{B-7}$$

then

$$R^L = Q^L \phi\, Q^L .$$

Thus,

$$\|R^L\| \le \|Q^L\|\,\|\phi\|\,\|Q^L\| = \|\phi\| . \tag{B-8}$$


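Similar expressions hold for the infimum of $R^L$ and $\phi$:

$$\lambda_{\min} = \min_{y} \frac{\|R^L y\|}{\|y\|} = \inf_{y} \frac{\|Q^L \phi\, Q^L y\|}{\|Q^L y\|} \ge \inf_{y} \frac{\|\phi y\|}{\|y\|} = \min_{\omega} |S(\omega)| \quad \text{(from (B-4))} . \tag{B-9}$$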
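
The bounds (B-8) and (B-9) state that every eigenvalue of $R^L$ lies between the minimum and the maximum of the spectrum. This can be checked numerically with the following sketch, which assumes the correlation sequence $e^{-\alpha|\ell|}\cos\omega_0\ell$ plus white noise of an arbitrarily chosen power; the closed-form spectrum used is the Fourier series of that assumed sequence, and the function names are hypothetical.

```python
import numpy as np

def spectrum(omega, alpha, omega0, noise_power):
    """S(w) for phi(l) = exp(-alpha |l|) cos(omega0 l) + noise_power * delta(l)."""
    r = np.exp(-alpha)
    pole = lambda th: (1 - r**2) / (1 + r**2 - 2 * r * np.cos(th))
    return 0.5 * (pole(omega - omega0) + pole(omega + omega0)) + noise_power

def spectrum_extremes(alpha, omega0, noise_power, n_grid=20001):
    """Minimum and maximum of S(w) on a dense grid over [0, pi] (S is even)."""
    s = spectrum(np.linspace(0.0, np.pi, n_grid), alpha, omega0, noise_power)
    return s.min(), s.max()

def eigen_range(L, alpha, omega0, noise_power):
    """Smallest and largest eigenvalue of the L x L correlation matrix."""
    lag = np.arange(L)
    diff = np.abs(lag[:, None] - lag[None, :])
    R = np.exp(-alpha * diff) * np.cos(omega0 * diff) + noise_power * np.eye(L)
    lam = np.linalg.eigvalsh(R)            # ascending order
    return lam[0], lam[-1]

if __name__ == "__main__":
    s_min, s_max = spectrum_extremes(0.2, 1.5, 0.01)
    print(f"spectrum: min={s_min:.4f} max={s_max:.4f} ratio={s_max / s_min:.1f}")
    for L in (8, 64, 256):
        lo, hi = eigen_range(L, 0.2, 1.5, 0.01)
        print(f"L={L:4d}: lambda_min={lo:.4f} lambda_max={hi:.4f} ratio={hi / lo:.1f}")
```

As L grows, the eigenvalue spread of the finite matrix approaches the ratio of the spectral extremes from below, which is why a rough knowledge of the spectrum is enough to estimate the condition number.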


It was assumed that G(z) has no zeroes on the unit circle; thus, we may
write $\min_\omega |S(\omega)| = a > 0$. Then, by (B-9),

$$\|(R^L)^{-1}\| = \text{max eigenvalue of } (R^L)^{-1} \le \frac{1}{a} . \tag{B-10}$$

Also, it follows from (B-5) and (B-9) (cf. [8]) that $\phi^{-1}$ exists and is bounded.


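Finally, we note that $R^L$ converges weakly to $\phi$; i.e., that

$$\lim_{L\to\infty} \|(R^L - \phi)\,y\| = 0 . \tag{B-11}$$

This follows from

$$\begin{aligned}
\|(R^L - \phi)\,y\| &= \|Q^L \phi\, Q^L y - \phi y\| \\
&= \|(Q^L \phi\, Q^L - Q^L \phi)\,y + (Q^L \phi - \phi)\,y\| \\
&\le \|\phi\,(Q^L - I)\,y\| + \|(Q^L - I)\,\phi y\| \\
&\le \|\phi\|\,\|(Q^L - I)\,y\| + \|(Q^L - I)\,\phi y\|
\end{aligned} \tag{B-12}$$

and the fact that

$$\|(Q^L - I)\,y\|^2 = \sum_{\ell=L}^{\infty} y_\ell^2$$

converges to zero as $L \to \infty$.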


Let $W^L$ be the solution of the L-dimensional Wiener-Hopf equation (5),

$$W^L = (R^L)^{-1} Q^L P , \tag{B-13}$$

where P is defined by (42). Let $f = \phi^{-1} P$ and $P^L = Q^L P$. Then
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Also, 


$$\|(A^L)^k W^L\| \le \|(A^L)^k W^L - (A^L)^k f\| + \|(A^L)^k f - A^k f\| + \|A^k f\| .$$

Hence,

$$\lim_{L\to\infty} \|(A^L)^k W^L\|^2 = \|A^k f\|^2 . \tag{B-17}$$


But the convolution theorem implies

$$(A^k)_{\ell m} = \frac{1}{2\pi}\int_{-\pi}^{\pi} \big(1 - 2\mu S(\omega)\big)^k\, e^{i\omega(\ell - m)}\, d\omega \tag{B-18}$$

and Plancherel's theorem gives

$$\|A^k f\|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} \big(1 - 2\mu S(\omega)\big)^{2k}\, |F(\omega)|^2\, d\omega \tag{B-19}$$

with $F(\omega)$ defined as in (52). Substitution of (8) and (B-19) into (B-17)
yields equation (53). A similar calculation using (8) and (11) gives
equation (54).
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